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LOCALIZED DENSITY MATRIX MINIMIZATION AND LINEAR SCALING 

ALGORITHMS 

RONGJIE LAI AND JIANFENG LU 


Abstract. We propose a convex variational approach to compute localized density 
matrices for both zero temperature and finite temperature cases, by adding an entry- 
wise f i regularization to the free energy of the quantum system. Based on the fact that 
the density matrix decays exponential away from the diagonal for insulating system or 
system at finite temperature, the proposed (\ regularized variational method provides 
a nice way to approximate the original quantum system. We provide theoretical analy¬ 
sis of the approximation behavior and also design convergence guaranteed numerical 
algorithms based on Bregman iteration. More importantly, the (i regularized system 
naturally leads to localized density matrices with banded structure, which enables us to 
develop approximating algorithms to find the localized density matrices with computa¬ 
tion cost linearly dependent on the problem size. 


1. Introduction 

Efficient calculation of the low-lying spectrum of operators plays a central role in 
many applications. In particular, in the context of electronic structure theory, given a 
discretized effective Hamiltonian (such as the current iterate in a self-consistent itera¬ 
tion), the goal is to obtain the density matrix corresponds to the number of electrons. 
For zero temperature, the density matrix is the projection operator onto the low-lying 
eigenspace; for finite temperature, the density matrix is given by the Fermi-Dirac func¬ 
tion acting on the Hamiltonian [20l[29l . 

In this work, we extend the variational approach for localized density matrix in our 
previous work [17] to finite temperature, by adding entrywise f: \ penalty to the free en¬ 
ergy of the quantum system (which, in the context of density functional theory, corre¬ 
sponds to the linear version of the Mermin functional [23]). We also theoretically show 
that the proposed localized density matrix approximates, via the Frobenius norm, the 
true density matrix linearly depends on the regularization parameter 1 Itj. In addition, 
convergence guaranteed numerical algorithms are also designed to solve the proposed 
problems based on Bregman iteration. 

More importantly, this paper focuses on efficient algorithms to minimize the vari¬ 
ational problem for localized density matrix both at zero and finite temperature. In 

The research of J.L. was supported in part by the Alfred P. Sloan Foundation and the National Science 
Foundation under award DMS-1312659. The authors would like to thank Stanley Osher and Vidvudz Ozolins 
for their encouragements and helpful discussions. 


1 




2 


RONGJIE LAI AND JIANFENG LU 


particular, we develop linear scaling algorithms such that the computational cost scales 
linearly as the dimension of the matrix. The key idea is to exploit the decay of the den¬ 
sity matrix away from the diagonal, the £ \ regularized localized density matrices enable 
us to approximate the original variational problem by restricting to banded matrices. 

Linear scaling algorithms have been a focused research direction in electronic struc¬ 
ture calculation since 1990s. Closely related to our context is the density matrix mini¬ 
mization (DMM) algorithms, first introduced by Li, Nunes, and Vanderbilt ITS] and Daw 
[6], and have been further developed since then, see e.g., the reviews [3] (Ill ■ These al¬ 
gorithms are based on the fact that for insulating system or systems at finite tempera¬ 
ture, the density matrix decays exponentially away from the diagonal (see e.g., fIH7llT5lL 
Thus, one may require the density matrices to satisfy prescribed sparsity structures, 
such as banded matrices for ID problem. As a result, the degree of freedom and the 
computational cost becomes linear scaling. Another closely related class of methods 
is the purification method of density matrix (see e.g., BTIl22l[24l[28l and a review [25)). 
Unlike the approach of DMM and that we take in this work, these methods are not vari¬ 
ational. 

We emphasize a crucial difference between our approach and the previous works: 
Our variational problem is still convex even after truncation! This is in stark contrast to 
the previous formulations where the convexity is lost by using purification (22j or other 
techniques to approximate the inverse of density matrix 0 ] . This loss of convexity often 
introduces local minimizers to the variational problem and also issues of convergence. 
We note that even when the £\ regularization is dropped from our variational problem, 
it is still convex and different from the standard DMM algorithms. In fact, it would be of 
interest to explore this convex formulation, which will be considered in our future work. 

The rest of the paper is organized as follows. In the next section, we introduce the 
variational principles for localized density matrix for both zero and finite temperature 
cases, with their approximation properties. In Section[3] we introduce the Bregman iter¬ 
ation type algorithms to solve these minimization problems. Linear scaling algorithms 
are discussed in Section 0 ] We validate the algorithms through numerical examples in 
Section[5] Some conclusive remarks are discussed in Section[6| 


2. Localized density matrix minimization 


In this work, we will consider a family of energy functionals with parameters (i and rj: 



( 1 ) 


where ||| ■ j|| denotes the entrywise £\ norm of a matrix and H and P are n x n symmetric 
matrices, which are respectively the (discrete) Hamiltonian and density matrix in the 
context of electronic structure calculation. Here f3 is the inverse temperature and i) is 
the parameter for the £ \ regularization. 
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Recall that the starting point of (171 is the convex variational principle for density 
matrix 


(2) 


min SooooiP) - min tr {HP), 

PeR”*" ’ PeR"*« 

s.t. trP = N,P = P T ,0<P<I, 


where the notation A<B denotes that B - A is a symmetric positive semi-definite ma¬ 
trix. Note that the constraint 0 < P < I is the convexification of the idempotency con¬ 
straint P — P 2 , which gives the same minimizer for non-degenerate problems (see e.g., 
fl7l Proposition 1]). Indeed, denote {A i,(pi}" =1 the eigenvalue and eigenvector pairs of 
H with the assumption A n < Ajv+i, the solution of {2} is given by 


N 


(3) 


Po o,oo - (pi^Pi > 

(=1 


the projection operator on the subspace spanned by the first N eigenvectors. 

The variational principle |2} corresponds to the physical zero temperature case (and 
hence the inverse temperature p - oo), for finite temperature, we minimize the free en¬ 
ergy 


(4) 


min (fflooCP) = min tr {HP) + -trjpinP+ (1 -P)ln(l -P)\ 

PeR"*« P-°° v p eR «xn pi I 


s.t. trP m N, P = P , 0 < P < I, 
where we have used the Fermi-Dirac entropy 

(5) (p{x)-x lnx+(1 - x)ln(l - x), x£[0,l]. 

Note that 

(6) tp'ix)- lnx-ln(l-x), 

(7) (p"{x) = x _1 + (l-x) _1 , 

and hence <p"{x) > 4 for x £ [0,1], Therefore, trcp[P) is strictly convex with respect to P 
and the minimizer of d exists and is uniquely given by 


( 8 ) 


i-i 


P 13,00= 1 + exp ()8(H-/i)) = 

L J i=i 


where pi is the occupation number of the /-th eigenstate, given by 

1 


Pi 


£ [ 0 , 1 ], 




1 + exp(P(A,- - p)) 

and p is the Lagrange multiplier for the constraint tr P — N, known as the chemical po¬ 
tential. It is determined by 

n i 

y - = n. 

{r 1 l + exp(P(A i-p)) 

For a fixed Hamiltonian matrix, p is then a function of fi. It is not difficult to see that 

lim p{fi) = i(A N + A N +i), 

f3—oo Z 


( 9 ) 
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which lies in between the highest occupied and lowest unoccupied eigenvalues. 

In [IT), the following £1 regularized version of the variaional principle { 2 } is proposed 
as a convexified model for compressed modes introduced in [27). 

1 

Poo,ri - arg min goon - mm tr(.HP) + -|||P|||i, 

' per"-" ' Pe««»" 77 

s.t. trP - N, P - P T , 0< P < I, 


where ||| ■ Hi is the entrywise £\ matrix norm and 77 is a penalty parameter for entrywise 
sparsity. The variational principle provides Poo,r\ as a sparse representation of the pro¬ 
jection operator onto the low-lying eigenspace. Numerical experiments in El demon¬ 
strate the localization of Poo,r\ for the above £ \ regularized model. 

For the finite temperature, applying the same £\ regularization to enhance sparsity, 
we arrive at 


( 11 ) 


min g Bt ,(P)= min tr(HP) + - trjpinP + (1 - P)ln(l - P)\ + -|||P|lli 

PeR nx,! ' p eR"x" p l J 77 

s.t. trP - N, P - P T , 0<P<I. 


Since this variational principle is strictly convex, the minimizer exists and is unique. 

We will call the minimizing density matrix obtained by the above variational prin¬ 
ciples the localized density matrix (LDM). We provide in the remaining of this section 
some approximation results of LDM as the parameter Tj —>• 00. 


Theorem 1. Assume H is non-degenerate that A,y < Ajv+i and denote Poo ,00 the mini¬ 
mizer of { 2 ). Let Poo, rj be a minimizer of i fTol . we have 


(a) 0 < <000,00 (Poc,17) — ^00,00 (Poc.oc) — — III 7*00,00 III 1 - 

V 


(b) \\Poo, v -P< 


||2 „ 2 IIIPoo.oollll 


00,00 ll p - 


V (Ajv + i - Aiv) 


In particularly, lim goo.ooiPoo.ri) = <?oo,oo(-P00,00), and lim 

Tj —*00 TJ—>00 

Remark. Recall that the minimizer of fTOt might not be 
theorem applies to any minimizers. 


|| P 00,77 ^ 00,001| p 0- 

unique El, nevertheless the 


Pi oof. It is clear that ^00,00(^00,00) — ^00,00(^00,77) Poo, 00 minimizes the energy & 00,00 ■ 
On the other hand,since P^ is an optimizer of 1 1 01 , we have, 


goo,oo{Poo,ri) — goo,oo(Poo,r /) + — III ^oo, Tj III 1 - <?oo,oo(T’oo.oo) + — III Poo,oo III 1 


which yields the statement (a). 

To show (b), we denote {A ;,(/>;}” =1 the eigenpairs of H and recall that 

Poo,oo — V f i f j ■ 
l<i<N 


Moreover, as P^ is symmetric and hence diagonalizable, we denote {< 7 ;, f ;}” =1 its eigen¬ 
pairs, with Poojj i>i = 0 i Vi. Note that = N and (Tj e [0,1] since Poo.rt satisfies the 
constraint of ITT)1 . 
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Using the property of trace, we calculate 

n n n 

tr(//Poo >7? ) = 7 HPcQjjfPi) = 7 i&if Poo.n<Pi) = * 7 hjSj, 

i= 1 7=1 7=1 

where the last equality defines the shorthand notation Si = {(pi,Poo,rj<l>i)- Since 0 < 
Poo, 7 ] ^ I and trPoo >77 = N, we have 

n n 

0 <Si< 1, and 7 s i = H <0/, £ 00,77 07 > = trPoo ,77 = N. 

1=1 7=1 

We now estimate, based on these properties of {$*}> 

— III Poo,00 III 1 — <^00,00 (Poo, 77) — ^00,00(^00,00) 

17 


N N 


- E *7 *7 ^ H ^ 


7=i 


j=N+l 


7 s j 


7=1 7=1 

N n N 

- h-N (Sj - 1) +-W+1 X! Sj = (Ajv+i - Ajv) £ (1 - s 7)- 
7=1 7=jv+i 7=1 


This yields 
( 12 ) 

We complete the proof of (b) by 


^ 1 III-Poo,oo 111 1 

N-Y,SJ=Y I 0.-S])<-- 
7=1 7=1 


17 (Ajv +1 - Ajv) 


(£00,77 Poo,00 II/7 — tr 


((£>00,77 Poo, 00) j 

= - 2 tr(P 0 o, 00 Poo,ij) + trC^oo) 

^ tr(P 00fl )-2tr(P 00 

,00 P 00,7 1 ) + tr(Poo )00 ) 

n 

— 27V — 2 ^ ((pitPoo.ooPoo^tpi) 

i '=1 

N 


= 2 ( 7 V-I>j)<- 

1 jTi J " 


2 III 7^00,00 lilt 


17 (Ajv+i - Ajv) 


□ 


For the finite temperature case, we have the following analogous result. 

Theorem 2. Denote Pp :00 and Pp^ the minimizer of 0) and fill respectively, it holds 


(13) 

and a/so the estimate 


Q^&P,oo(Pp,r])-&p,oo(Pp,oo) £ ~ III P p,oo III 1 > 


(14) 


trfmaxf/S-LlH-juDtP^-P/,^,) 2 ) < - III ^oo III i - 
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Remark. Note that as immediate consequence of fl4t , we have the estimate in Frobe- 
nius norm 

(15) II Pp, v - Pp,oo II p ^ ~ III P/3,oo III i min(/?, max | A,- - p.\ ” 1 ). 

V 1 

Taking the limit fi —>■ oo, as the chemical potential fi—> : (/l,v + /l,v+i), we get 

. i, | Ajv+i-Ajv 

lim min A,- - u\ —*-. 

p~co i 1 2 

Therefore, we recover the estimate for zero temperature case (assuming Ajv < A/y + 1 ). 
Proof. By optimality of Pp it] and Pp jOQ , we have &p t00 {Pp j00 ) < £p i00 (Pp iV ) and 

(16) -H|P/l,collll > epfiolPprf-SptolPpflo) 

= tr [(Pp, v - Pp, oo)H) + tr[(p{Pp fT] ) - <p(Pp,oo)). 

Hence, we obtain the first conclusion fi~3l of the theorem. 

Recall that cp'{x) — lnx - ln(l - x) and hence by explicit calculation using <(TT} 

<p\Pp,oo) =ln(^U r --P^)“ 1 ) = - j3(H-/i). 

Therefore 

tr [{Pp, v -Pp,oo)H) = tr(( Pp itl -P Piao KH-n)) = -f>~ Y tr(q/(P Pi00 )(P pjl -Pp i00 )), 

where we have used that tr Pp^ = trPp i00 — N in the first equality. Substitute into fl6l . 
we get 

(17) HII^,oollli a r 1 [tr(v(P M ) -<piPp,oo)) -tiitp'lPpeoHPfo -P/3,co))] 

Note that the right hand side is the Bregman divergence of <p. For x,y e [0,1], we have 

(p{x) -q>[y)~ cp'[y){x- y) — x(lnx-lny) + (1 - x)(ln(l - x) - ln(l - y)), 

which is the Fermi-Dirac relative entropy. Following a similar calculation in [14 Theo¬ 
rem 1] (see also an improved version in (9)), we minimize x e [0,1] for fixed y and find 

, x , 1 - x l n o 

xln —I- (1 - x)ln - >—Wx-y) 2 . 

y 1 -y l-2 y 

Explicit calculation verifies that for y e [0,1], we have 

lni T (I, l-y, , 

Hence, combining with IT71 and using Klein’s lemma (30] Theorem 2.5.2], we arrived at 
the estimate QU 

^ III f/ 3 ,oo III l ^ tr (max(/3 _1 , | H - ju|) ( Pp, n - Pp.oof) ■ 

□ 
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3. Numerical algorithms for LDMs 


3.1. Bregman iteration for LDMs. In [17], a numerical algorithm has been proposed 
to solve m based on Bregman iteration. Bregman iteration was first introduced into 
information science for solving total variation related problems as an analog of “adding 
back the noise” in image denoising ||26]. Split Bregman iteration has been later proposed 
in [13] based on the idea of variable splitting. These algorithms have since received in¬ 
tensive attention due to its efficiency in many £\ related constrained optimization prob¬ 
lems l34ll35l . The equivalence of the Bregman iteration with the alternating direction 
method of multipliers (ADMM), Douglas-Rachford splitting and augmented Lagrangian 
method can be found in I5ll8l l33li35l . 

Let’s first recall the algorithm proposed in fT71 . By introducing auxiliary variables Q 
and R, the optimization problem flOl is equivalent to 


(18) 


min — III Q III i +tr(HP) 

P,Q,Re R" xn TJ 


s.t. Q - P, R - P, ti P - N, 0 < R < I, 


The method of Bregman iteration suggests to approach fj~8l by solving: 

(19) (P k ,Q k ,R k ) = arg min HUQUl! +tpHP) 

P,Q,Re R' !x '> 77 

+ ^\\P-Q + B k - l \\ 2 F + r -\\P-R + D k - 1 \\ 2 F 
s.t. trP - N, 0 < R < I, 

(20) B k = B k ~ 1 + P k -Q k , 

(21) D k = D k ~ 1 + P k -R k , 


where variables B, D are essentially Lagrangian multipliers and parameters r, A control 
the penalty terms. Solving P k , Q k , R k in fTTil alternatively leads to algorithmQ] proposed 
indZ). 


1 Initialize Q° - R° - P° e^ - {P el 

2 while “notconverge" do 


P = P J ,trP = N,0<P< I], B° - D° - 0 


r k.. r k tr(T K )-N 


where T = 


A -(Q k ~ 1 -B k ~ 1 ) + - T 


■{R k ~ l - D k ~ l ) - ■ 


-H. 


A+r A+r A+r 

Q k = Shrink( P k + B k ~\ —} = sign(P fc + B k ~ l ) maxi \P k + B 




Xrj 


k-l 1 


R k = l/min{max{A,0}, ljV 1 , where [17, A] = eig {P^ + D K 1 ). 


fc-ti 


A 77 


,0 


B k = B k -1 + p k _ Qk 

D k = D k ~ 1 + P k -R k . 


Algorithm 1: Zero temperature localized density matrix minimization 
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Similarly, the LDM for the finite temperature case proposed in ITT] can also be solved 
based on Bregman iteration. By introducing auxiliary variables Q and R, the optimiza¬ 
tion problem {TlJ is equivalent to 

1 1 ( 1 

min - HI Q HI i + tr (HP) + - tr {PlnP + (1 - P) ln(l - P) \ 

^ 22 ) P,Q,RaR nxn T] p 1 > 

s.t. Q = P,R = P,trQ = N,0<R<I. 


Note that we have explored the flexibility of the augmenting approach to impose the 
trace constraint on Q, which is equivalent of imposing the constraint on P. The mini¬ 
mization QD can be iteratively solved by: 

[P k , Q k ,R k ) - arg min -|||Q|||i + tr(HP) + - trjpinP + (1 - P)ln(l - P)\ 
P.Q.Re u nxn T] /3 l I 

+ ^\\P-Q + B k - 1 \\ 2 F +'-\\P-R + D k - 1 \\ 2 F 

s.t. trQ = N,0<R<I, 

B k = B k ~ l + P k — Q k , 

D k = D k - 1 +P k -R k , 


where variables B, D and parameters r, A have the similar roles as in 1191 . Solving P k , Q k ,R k 
in (23) alternatively leads to the following three sub-optimization problems. 

1 /L 

1° P k = arg p min n tr(HP) + ^ tr jpinP + (1 - P) ln(l-P)} + j\\P-Q k ~ 1 + B k ~ l f p 


PeR"* 

ifc-l 


" t_1 |||. 


+ -|| P-R^'+D 
2 

2° Q k = arg min - 1|| Q||| l + \ || P k - Q + B 
Qe r"*' 1 t) 2 


k—l m 2 
F< 


s.t. txQ-N. 


s.t. 0<R< I. 


3° R lc = arg min \\P k - R + D fc_1 || 2 P , 

b ReR nxn F 

Note that the sub-minimization problem 3° of algorithm[ 2 ]can be solved explicitly, sim¬ 
ilarly as before 


R k = Vmin{max{A,0},l}^ J , where [V, A] = eig(P K + D k ~ l ). 


To solve the sub-problem 2° of algorithm^ let’s denote by Q 0 the off-diagonal part of 
Q and write Qd as the diagonal vector of Q. Namely, we have Q — Q„ + diag(Qd). We 
also use similar notations for all other matrices. Then the solution of the sub-problem 
2° can be written as Q k — Q k + diag(Q^ ; ), where Q k and Q k f are given by (denoting M k = 
P k + B k ~ 1 ) 

(24) Q k = Shrink(M^,(A 77 ) _1 | = sign(Af*)max{| m*| - (At 7 ) _1 ,o| , 

(25) Q k = argmm-\\Q d \\ 1 + ^\\Q d -M k \\ 2 , s.t. 1 T Q d = N. 

here, 1 is a n x 1 vector with all elements 1. Note that (25] is a convex optimization prob¬ 
lem with size n, which can be efficiently solved using the following Bregman iteration. 
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Q k ’ s - Shrink 


A + r 


-Mi 


A + r 


(V s - 1 -b*- 1 ), 


1 


(A+ r )?7 


(26) 


v s = Q k / + b s ~ l - - (l r (Q*' s + b^ 1 ) - Af]. 


b , = b s ~ 1 + Q%’-v s . 


Next, we propose to iteratively solve the sub-minimization 1° in algorithm^ 


(27) P = arg min tr {HP) + 
& Pe r»» /3 


— tr|pinP + (1 — P)ln(l - P) j + — ||P - Q fc_1 + B fc_1 


— 1 H 2 

F 


P-R k ~ l +D k ~ l |||. 


Note that this is a convex problem, and hence the existence of uniqueness of P k is guar¬ 
anteed. By the KKT condition, P k satisfies 


1 


(28) H+- (In P K - ln(l - P K )) + A (P K - Q k ~ l + B k ~ l ) + r (P K - R K ~ l + D K ~ L ) = 0. 
Equivalently, we may write the above equation as 

(29) P k = [l + exp[/3(H + A(P fc - Q k ~ l + B k ~ l ) + r(P k - R k_1 + D k ~ l ) 

Thus a natural iterative scheme to solve for P k is given by 

1 


-yk—\ , r\k —lv 


-1 


(30) 

Z l+1 

(31) 

Y l = 


1 + exp {f3Y l ) ’ 

r + A (Z l - Q k ~ l + B k ~ l ) + r(Z I - R k ~' + D k ~ l ). 

The following proposition gives the convergence of the above scheme. 

Propositions. Assume /3(A+ r) <4. Given any symmetric matrices Q, R, B, D e K” x ", the 
iteration scheme 

Z l+1 = 1 


1 + expifY 1 ) ’ 

Y l = H + A (Z l -Q + B) + r{Z l -R + D) 


converges exponentially as l —► oo: 


(32) 


^ II *70 


\Z l -Z*\\ = (/3(A + r)/4) \\Z' 


where Z* is the unique solution of 1271 . 
Proof. Denote the Fermi-Dirac function 


1 + exp(jSx) 


We have then 


max 

X 


|^w| - 


max 

X 


/3exp(/3x) 


(l + exp (/lx)) 


</3l A. 
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Therefore, 

||z' + 1 -z*|| = ^ |f z -f*| = ^ (A 4 +,) |z'-z*|, 

where Y* - H + X(Z* - Q + B) + r{Z* - R + D). The proposition then follows by iterating 
with respect to l. □ 

In summary, we arrive at algorithm[ 2 ]for fTO . 


1 Initialize Q° = R° = P° e , B° = D° = 0 

2 while “notconverge" do 

3 while “notconverge" do 

4 Y k ’ 1 - H + A (Z M_1 - Q k ~ l + B k ~ l ) + r{Z k,l ~ 1 - R k ~ l + D k ~ l ). 

5 Z k ’ 1 = 1 _ 

1 + exp {/3Y k ’ 1 ) 

6 P k = Z k ’ 1 . 

7 Q k - Q k + diag(Q^), where Q k and Q k are given by (24) and J25) . 

s R k = T/min{max{A,0}, 1 } V T , where [V", A] = eig {P k + D k ~ l ). 

9 B k = B k ~ l + P k - Q k . 

o D k = D k ~ 1 + P k — R k . 

Algorithm 2: Finite temperature localized density matrix minimization. 


Remark. In practice, it is not necessary to require all inner iterations convergence in 
algorithm^ In our numerical experiments, we run all the inner iterations in algorithm 
[2]within a given small number of steps, as /J converges exponentially by Proposition^ 

Theorem 4 (Convergence of Algorithm[l]and Algorithm^ . 

(1) The sequence \(l >k , Q k , R k )}generated by algorithm\T\from any starting point 
converges to a minimum of the variational problem fTOl . 

(2) The sequence \ {P k , Q k , R k )} k generated by algorithm^ from any starting point 
converges to a minimum of the variational problem fill . 

Proof. The convergence of algorithm Q] is proved in [17;]. The proof in fact also applies 
verbatim to algorithm[ 2 ]as it is written for generic convex energy E(P). □ 

While the minimization problems (Tot and {Tl} are convex and the proposed algo¬ 
rithms converge to the minimizers by theorem[4] it is also clear that the computational 
efficiency of the algorithms Q] and G] is limited by the eigen-decomposition in the step 
of eigenvalue thresholding, and also the inner iteration in the finite temperature case. 
The computational cost for standard eigen-decompsition algorithm is T/{n ? ‘), which is 
rather expensive for large scale calculations. In the next section, inspired by ideas from 
linear scaling algorithms for electronic structure, we propose approximate algorithms 
to solve m and ITT) by replacing eigendecompsition with polynomial functions acting 
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on matrices. The resulting algorithms have computational cost linearly scaled with the 
matrix size n. 


4. Approximation by banded matrices and linear scaling algorithms 

Based on results indicated in theorems |T] and [2] the proposed LDM serves as a nice 
approximation of the true density matrix. More importantly, similar to many (. \ regular¬ 
ization methods developed for compressed sensing problems, the minimizing density 
matrices are expected to have certain sparse structure. More precisely, we denote the 
set of all banded matrices with band width w as 


(33) ® w = {P = ( ptj ) £ K nx " |P = P T , pij = 0, V j € , 


where Jf .denotes as a ^-neighborhood of i. In particular, for the ID examples con¬ 
sidered later in this paper, the neighborhood is chosen as 


jY™ — {j £ {1,2,...,«} | \i — j\ mod n < w}, 


for an given band width w £ {0,1,2, ■■ •, [-|]} (w is typically chosen much smaller than 
n/2). For example, consider a banded discretized Hamiltonian H (e.g., a central differ¬ 
ence discretization of Hamiltonian -\\+V), numerical results in [13 suggest that the 
LDM is banded with a small band width. We remark that, however, theoretical valida¬ 
tion of this observation is still open and remains to be investigated in future works. 

This motivates the following variational problems to approximate the LDMs pro¬ 
posed in fTOl and fill by simply constraining the problems on the set of banded ma¬ 
trices. 


(34) 


mmQtP) = „rnin tr(fTP) + — lll^llli. 


p eK ;ixn 


PeR" 


i.t. trP - N, P - P T , 0< P < I, P £, 


(35) 


min SaAP) - mm 

p eR nxn P,V p eK « 


i tr(HP) + - tr jpinP + (1 - P)ln(l - P)\ + - 
xn ft I >n 


s.t. trP - N, P - P T , 0< P < I, P £ SS W . 

Note that the above two optimization problems are still convex. This is in contrast to the 
usual minimization problems developed in the literature of linear scaling algorithms 
01122 ]. 

The advantage of considering banded matrix is that it allows for linear scaling algo¬ 
rithms. Let us consider first the eigenvalue thresholding: 

(36) R- l^min{max{A,0}, \}V T , where [V, A] =eig(A4). 

Observe that the above formula can be written as a matrix function 

(37) R-h{M), where h(x) — min{max{x,0}, 1} = min{l/2(|x| + x), 1}. 




12 


RONGJIE LAI AND JIANFENG LU 


The standard evaluation of the matrix function h using spectral theory needs diagonal- 
ization as in |36). The key idea is to approximate general matrix functions by polyno¬ 
mials, as the polynomials acting on matrix only involves products and sums, which can 
take advantage of the bandedness of the matrix. This type of algorithms has been ex¬ 
plored extensively in the literature of linear scaling algorithms (see [T2] and the review 
articles (3l [TTl[25l l). 

More specifically in the current context, we will approximate the hard thresholding 
function h{x) - min{l/2(|x| +x), 1} using the Chebyshevpolynomial approximation [32 1. 
Recall that the standard Chebyshev polynomials approximation minimizes the L°° error 
on the interval [-1, 1], hence, we will first rescale the matrix such that its eigenvalue lies 
in the interval. 

For this, we first use the power method to estimate the largest eigenvalue A max and 
the smallest eigenvalue A m ; n of M. An affine scaling of M gives s{M), whose eigenvalue 
is in [-1,1], where 

s{M) = 1- —} -W*-Amin) - 1. 

^max /'-■min 

Note that its inverse is given by 

-1,, _ A max — A m in A m i n + A max 
5 (M) =-M +-. 

2 2 


We approximate 

h(M) = fc°s _1 (s(M)) ~ r™ r (s(M)), 

where T™ t is the Chebyshev polynomial approximation of ho s -1 on [-1,1]. Figure[7ja) 
illustrates the Chebyshev polynomial approximation of h (i.e., ho s -1 assuming the scal¬ 
ing function s is identity) by 40 degree polynomials. Higher degree polynomial is needed 
if M has a larger spectrum span. In our numerical tests, it seems that fixing the degree 
be 40 gives satisfactory result for the test examples. 

As acting T™ t on s[M) involves matrix products which will increase the matrix band¬ 
width. The resulting matrix is projected (with respect to Frobenius norm) on the space 
3B W to satisfy the constraint. This projection is explicitly given by the truncation opera¬ 
tor ‘X w : U nxn — ► 3B W which sets all matrix entries to 0 outside the if-band. 

To sum up, the eigenvalue thresholding can be approximated by algorithm [3] For 
banded matrix with band width w, the computational cost of this algorithm is 0[nm 2 w) 
which scale linearly with respect to the matrix dimension n. Therefore, for matrices in 
9B lv , this algorithm has much better computational efficiency in particular for matrices 
of large size. 

Replacing the eigenvalue thresholding in algorithm [4] by algorithm [3] we obtain the 
following algorithm for computing LDM in the zero temperature case. We remark that 
all the steps in algorithm 0] preserves the bandedness of the matrices: For the approx¬ 
imate eigenvalue thresholding, this is guaranteed by the explicit projection step; it can 
be easily checked for the other steps. Hence, the iterates of the algorithm ( Q,R,P,B,D ) 
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1 P — EigenThresviaChebyPoly(i?m) 

Input: P e SS W , m 

Output: P e 3B W 

2 Estimate A max and A m i n of P using the power method. 

3 Compute T™ t {x) as the Chebyshev polynomial approximation of 

Jl | >t m ax~^min _j_ 't m j n + A max j 

4 Pnew—~: : [P ~ Amin) - 1. 

Amax — zlmin 

5 P — C £ W T™ T (P new ). 

Algorithm 3: A linear scaling algorithm for eigenvalue thresholding. 


1 Initialize Q° = R° - P° e^f]38 W ,B° - D° = 0, choose m et . 

2 while “not converge" do 


nk u tr(T fc ) - Af , , 

P k - -, where E - 


A + r 


(Q fc_1 - B k ~ l ) - 


(Rb- 1 - D K ~ l ) - 


fc-i A 


A + r 


1 


A + r 


-H. 


Q k = Shrink ^ — j. 

R k = EigenThresviaChebyPoly(P fc + D k ~ l ,m et ). 
B k = B k ~ l +P k -Q k . 

D k = D k ~ 1 +P k -R k . 


Algorithm 4: A linear scaling algorithm for computing LDM at zero temperature. 


will remain in 3S W as the initial condition lies in the set. Thus, the whole algorithm is 
linear scaling. 

For the finite temperature case, we need a further approximation for evaluating the 
Fermi-Dirac matrix function (1+exp (/IF)) -1 , as direct computation also involves matrix 
diagonalization, which is 0{n 3 ). Following the same procedure as in algorithmic we 
can achieve a linear scaling algorithm by a Chebyshev polynomial approximation of the 
Femi-Dirac function <pf}{x) - (1 + exp (/lx)) -1 . 

This leads to an algorithm FermiDiracviaChebyPoly(F, m) for approximating the Fermi- 
Dirac operation by simply replacing the hard thresholding function h{x) with the Fermi- 
Dirac function (pp(x) in algorithmic Hence, we omit the details. l ; igure[7Jb) shows the 
approximation of (f)fj(x) for /) = 10 with a degree 40 polynomial. Nice agreement is ob¬ 
served. We remark that if the temperature is lower (so that /) is larger), a higher degree 
polynomial is needed as the function (pp (x) has larger derivatives. In fact, as /) oo, 
<pp converges to a Heaviside function with jump at x = 0. Thus, we arrive at the follow¬ 
ing algorithm [C for computing EDM of the finite temperature case with linear scaling 
computation cost with respect to the matrix size n. 
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Figure 1. (a): Chebyshev approximation of h(x) with m = 50 degree 
polynomial, (b): Chebyshev approximation of for p — 10 with 
m — 40 degree polynomial. 


i Initialize Q° = R° - P° e Sg’f] SB W ,B° — D° - 0, choose rrifd, m et 
i while “notconverge" do 

3 while “notconverge"do 

4 Y k ’ 1 -H + \{Z k ’ l ~ l - Q k ~ l + B k ~ l ) + r{Z k ’ l ~ l - R k ~ l + D k ~ l ). 

5 Z k ' 1 = FermiDiracviaChebyPoly(y fc,i , nifa). 

6 p k = Z k ’ 1 . 

7 Q k = Q k + diag(Q^), where Q k and Q k are given by (24) and (25) . 

s R k = EigenThresviaChebyPoly(P fc + D fc_1 > me f ). 

9 b k = B k ~ l + P k - Q k . 

10 d k = D k ~ 1 +P k -R k . 

Algorithm 5: A linear scaling algorithm for computing LDM at finite temperature. 


We remark that since polynomial products are applied to approximate the hard thresh¬ 
olding function and the Femi-Dirac operation, approximation errors have been intro¬ 
duced in each iteration of the proposed algorithms |4]and[5] Therefore, the convergence 
proof used in theorem [4] as we discussed in (T7J can not be directly applied, although 
our numerical results in Section [5] illustrate satisfactory approximation to model[2]and 
model [~0 Note that similar issues arise in theoretical understanding of convergence of 
other iterative linear scaling algorithms, for example [ 10 ]. 





















LOCALIZED DENSITY MATRIX MINIMIZATION AND LINEAR SCALING ALGORITHMS 


15 


5. Numerical Experiments 

In this section, numerical experiments are presented to demonstrate the proposed 
models and algorithms for LDM computing for at zero and finite temperatures. We con¬ 
duct numerical comparisons of our results between LDM computation with and with¬ 
out the linear scaling algorithms, which indicates satisfactory results of the proposed 
linear scaling algorithms based on approximation by banded matrices. We further illus¬ 
trate efficiency of the proposed linear scaling algorithms. All numerical experiments are 
implemented by M ATLAB in a PC with a 16G RAM and a 2.7 GHz CPU. 

Banded Energy gap 

potential function a2 | ' ' ' ' ' : 

o * • 

- 0.2 .. 

-0.4 

- 0.6 

0 10 20 30 40 50 60 70 80 90 100 J 

°' 8 0 5 10 15 20 25 30 



(a) (b) 

Figure 2. (a): The potential function in the modified Kronig-Penney 
model, (b): The spectrum of the (discretized) Hamiltonian operator. 

In our experiments, we consider the proposed models defined on ID domain D = 
[0, 100] with periodic boundary condition. In this case, the matrix H is a discretization 
of the Schrodinger operator |A + V defined on Q. Here, A is approximated by a cen¬ 
tral difference on [0, 100] with equally spaced 400 points. In addition, we consider a 
modified Kronig-Penney (KP) model [16j| for a one-dimensional insulator. The original 
KP model describes the states of independent electrons in a one-dimensional crystal, 
where the potential function V (x) consists of a periodic array of rectangular potential 
wells. We replace the rectangular wells with inverted Gaussians so that the potential is 
given by 

Af a t 

(38) V{x) - -V 0 ]T exp 

;=i 

where N a t gives the number of potential wells. In our numerical experiments, we choose 
A/at = 10 and xj - 100 //11 for j - 1 ,...,JV at . The potential is plotted in Figure [2}a). 
For this given potential, the Hamiltonian operator H - -1A + V (x) exhibits two low- 
energy bands separated by finite gaps from the rest of the eigenvalue spectrum (See 
FigurelUb)). 

The first experiment demonstrates the proposed methods at zero temperature for 
an insulating case, where we set parameters as rj — 100, N = 10, m e t — 50. Figure [Ha) 
illustrates the true density matrix Zisisio <t>i<l>J obtained by the first 10 eigenfunctions 


[X-Xj ) 2 

8 2 
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{0,-}™ of II. Figure Ob) plots the density matrix obtained from the proposed model 
using algorithm!]] As one can observe from FigureOb), the LDM provides a quite good 
approximation to the true density matrix. Note that the LDM with a larger tj imposes a 
smaller penalty on the sparsity, and hence the solution has a wider spread and closer to 
the true density matrix. The approximation behavior is stated in theoremQ] We also re¬ 
fer our previous work \V7} for more numerical discussion about this point. FigureOc, d) 
illustrate computing results of LDM using algorithm[4]with band size w = 10,20 respec¬ 
tively. As we can see from the results, a moderate size of band width can provide satisfac¬ 
tory approximation for the LDM. In addition, we also conduct quantitative comparisons 
between algorithm [l] and algorithm [4] in table |T] where P and P w are results obtained 
from algorithm [T]and algorithm[4]respectively. The relative energy error 
and the comparable relative differences between and 11 - indicate that 

results obtained by linear scaling algorithm[4]can provide good estimation of the LDM 
created from algorithm!]] 

For the finite temperature case, it is known that the true density matrix decays expo¬ 
nentially fast along the off-diagonal direction. In the second experiment, we test the al¬ 
gorithms for the finite temperature model with potential free case and modified KP case. 
We set parameters tj = 100, /3 = 1, m et = 40, nifd = 20 for both cases. FigureEta) and Fig- 
ureEJa) illustrate the corresponding true density matrices described by L; pi<pi<pj using 
ID , whose non-zero entries concentrate on a narrow band along the diagonal direction. 
Figure [4jb) and Figure Elb) plot the density matrix obtained from the proposed model 
using algorithm^ It is clear that the LDM performs reasonably good approximation to 
the true density matrix. Figure Etc, d) and Figure Etc, d) illustrate computing results of 
LDM using algorithm[5]with band size w =10,20 respectively. As we can see from the re¬ 
sults, a moderate size of band width can provide satisfactory approximation for the LDM 
as the true density matrix has exponential decay property. In addition, we also conduct 
quantitative comparisons between algorithm[2]and algorithm[ 5 ]in table[H The relative 
energy error and the comparable relative differences between 

and indicate that results obtained by linear scaling algorithm[5]can provide 

satifactory estimation of the LDM created from algorithm^ 

In the third experiment, we test the computation cost of the proposed algorithms for 
approximating the eigenvalue thresholding and the Fermi-Dirac operation. Using dif¬ 
ferent matrix sizes, the EigenThresviaChebyPoly designed by algorithm [3] is applied to 
approximate the step of hard thresholding eigenvalues used in the algorithm |T] step 5. 
Similarly, we also test FermiDiracviaChebyPoly for different matrix sizes for the Fermi- 
Dirac operation. Red curves in Figure[6](a, b) report log-log curves of the average com¬ 
putation cost for both cases with respect to different matrix sizes, where blue curves 
illustrate the corresponding linear fitting for the computation cost curve. For banded 
matrices, It is clear that the computation costs for both approximation algorithms based 
on Chebyshev polynomials are linear scaling to the matrix sizes. 









Figure 3. Result comparisons of the zero temperature case with KP 
potential (n = 400 ,N = 10). (a). The true density function obtained 
by the first N eigenfunctions of H. (b). Solution of the localized den¬ 
sity matrix with rj = 100 using algorithm!]] (c), (d). Solutions of the 
localized density matrices (77 = 100) using algorithm |4] with w = 10,20 
respectively. 


We next report comparisons for the computation cost of all proposed algorithms, 
where we set parameters the same as those used in the first and the second experiments. 
We first conduct comparison between algorithm[l]and algorithm[4]for the zero tempera¬ 
ture case with matrix size chosen from 10 3 to 10 4 . Figure[7](a) reports the log-log plots of 
average time consumption of each iteration for both algorithms. From the slope of the 
corresponding linear fitting curves, it is clear to see that the original proposed algorithm 
|T]has computation cost cubically dependent on the matrix size, while the algorithmic 
based on the banded structure has computation cost linearly dependent on the matrix 
size. As we can observed from Figure[7](a), the linear scaling algorithm can significantly 
reduce the computation time when n is larger than certain moderate size. Similarly, we 
also conduct comparison between algorithm [2] and algorithm [5] for the finite tempera¬ 
ture case, where matrices size has been chosen from 10 2 to 2.5 x 10 3 . The slopes of the 
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Figure 4. Result comparisons of the finite temperature case with KP 
potential (n = 400,1V = 10, p = 1). (a). The true density function ob¬ 
tained by the first N eigenfunctions of H. (b). Solution of the localized 
density matrix with tj - 100 using algorithmic] ( c )» (d). Solutions of the 
localized density matrices (ij = 100 ) using algorithm[5]with w - 10,20 
respectively. 


corresponding linear fitting curves also indicate that the proposed algorithm[5]has lin¬ 
ear scaling dependent on the problem size, while the original algorithm [2] is cubically 
scaling to the problem size. Therefore, we can take the linear scaling advantage of the 
proposed algorithm[4]and algorithm[5]based on the banded structure for problems with 
large size. 


6 . Conclusion and future works 

This work extends our previous work [17 ] for constructing LDMs to the finite temper¬ 
ature case by adding an f: \ regularization to the energy of the original quantum system. 
As it has been shown that the density matrix decays exponential away from the diag¬ 
onal for insulating system or system at finite temperature, the LDMs obtained by the 
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Figure 5. Result comparisons of the potential free finite temperature 
case(n = 400, AT = 10, /3 = 1). (a). The true density function obtained 
by the first N eigenfunctions of H. (b). Solution of the localized den¬ 
sity matrix with 77 = 100 using algorithm [2] (c), (d). Solutions of the 
localized density matrices (77 = 100 ) using algorithm[5]with w - 10,20 
respectively. 


proposed convex variational models provide good approximation to the original den¬ 
sity matrices. Theoretically, we have conducted analysis to the approximation behav¬ 
ior. In addition, we also design convergence guarantteed numerical algorithms to solve 
the proposed localized density matrix models based on Bregman iteration. More im¬ 
portantly, by observing that the t \ regularized system can naturally create a localized 
density matrix with banded structure, we design approximate algorithms to find the 
localized density matrices. These numerical algorithms for banded matrices have com¬ 
putation complexity linear scaling to the matrix size n. 

This paper mainly focuses on proposing variational methods for LDMs, theoretically 
analyzing its properties and designing numerical algorithms. Thus, we only test some 
simple examples in the experimental part to illustrate the proposed modes and algo¬ 
rithms. In particular, we have taken a naive finite difference discretization. All codes are 
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MHP)-tt(HP w )\ 

\tt(HP)\ 

\efo(P)-£to(Pw)\ 

l<%r,OTI 

\\p-z w (pn F 

IIPIIf 

IIR-P^IIf 

IIPIIf 

zero temperature 
[P-oo,rj- 100, 
modified KP) 

w = 10 

1.00 X 10 1 

6.36 x 10“ 2 

1.46 x 10” 1 

2.32 x 10” 1 

w — 15 

4.30 x 10" 2 

5.28 x 10“ 2 

2.24 x 10" 2 

3.27 x 10" 2 

w = 20 

4.97 x 10" 3 

1.20 x 10“ 2 

1.01 x 10“ 3 

1.56 x 10" 2 

finite temperature 
(/3 = 1,77 = 100, 
modified KP) 

w= 10 

8.31 x 10" 3 

1.44 x 10“ 2 

3.86 x 10" 3 

8.46 x 10" 3 

w- 15 

5.87 x 10“ 4 

5.78 x 10“ 4 

2.51 x 10“ 4 

5.38 x 10“ 4 

w = 20 

9.40 x 10“ 4 

4.12 x 10“ 4 

1.36 x 10“ 4 

3.05 x 10" 4 

finite temperature 
(P= 1,77= 100, 
potential free) 

II 

H 

O 

2.13 x 10" 3 

2.78 x 10“ 2 

4.04 x 10" 3 

5.99 x 10" 3 

w - 15 

3.03 x 10" 6 

1.03 x 10“ 3 

2.66 x 10“ 4 

4.81 x 10“ 4 

w = 20 

1.42 x 10" 5 

6.82 x 10“ 4 

1.54 x 10“ 4 

2.35 x 10“ 4 


Table 1. Approximation error using banded matrices with 77 = 100 for 
all cases. 




(a) (b) 

Figure 6 . (a): Time consumption of EigenThresviaChebyPoly(P,40) 
with different sizes of matrix P and fixed band width w - 10. (b): Time 
consumption of FermiDiracviaChebyPoly(P,40) with different sizes of 
matrix P and fixed band width w = 10. 

implemented in MATLAB for test purpose and are not optimized. In our future work, 
we will further accelerate the computation from several aspects. First, we can discretize 
the system using better localized basis to reduce the problem size I2 IT91I3T1 . Second, we 
can improve the implementation for better computation performance. In addition, we 
will explore theoretical analysis for decay properties of the LDM in future works. 
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